Fuzhou
TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning
Ni, Hang, Liu, Fan, Ma, Xinyu, Su, Lixin, Wang, Shuaiqiang, Yin, Dawei, Xiong, Hui, Liu, Hao
Large language models (LLMs) have shown promise in automating travel planning, yet they often fall short in addressing nuanced spatiotemporal rationality. While existing benchmarks focus on basic plan validity, they neglect critical aspects such as route efficiency, POI appeal, and real-time adaptability. This paper introduces TP-RAG, the first benchmark tailored for retrieval-augmented, spatiotemporal-aware travel planning. Our dataset includes 2,348 real-world travel queries, 85,575 fine-grain annotated POIs, and 18,784 high-quality travel trajectory references sourced from online tourist documents, enabling dynamic and context-aware planning. Through extensive experiments, we reveal that integrating reference trajectories significantly improves spatial efficiency and POI rationality of the travel plan, while challenges persist in universality and robustness due to conflicting references and noisy data. To address these issues, we propose EvoRAG, an evolutionary framework that potently synergizes diverse retrieved trajectories with LLMs' intrinsic reasoning. EvoRAG achieves state-of-the-art performance, improving spatiotemporal compliance and reducing commonsense violation compared to ground-up and retrieval-augmented baselines. Our work underscores the potential of hybridizing Web knowledge with LLM-driven optimization, paving the way for more reliable and adaptive travel planning agents.
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Yunnan Province > Kunming (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (13 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
A Novel Wasserstein Quaternion Generative Adversarial Network for Color Image Generation
Jia, Zhigang, Wang, Duan, Wang, Hengkai, Xie, Yajun, Zhao, Meixiang, Zhao, Xiaoyu
Color image generation has a wide range of applications, but the existing generation models ignore the correlation among color channels, which may lead to chromatic aberration problems. In addition, the data distribution problem of color images has not been systematically elaborated and explained, so that there is still the lack of the theory about measuring different color images datasets. In this paper, we define a new quaternion Wasserstein distance and develop its dual theory. To deal with the quaternion linear programming problem, we derive the strong duality form with helps of quaternion convex set separation theorem and quaternion Farkas lemma. With using quaternion Wasserstein distance, we propose a novel Wasserstein quaternion generative adversarial network. Experiments demonstrate that this novel model surpasses both the (quaternion) generative adversarial networks and the Wasserstein generative adversarial network in terms of generation efficiency and image quality.
- Asia > Singapore (0.14)
- Asia > China > Jiangsu Province > Xuzhou (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (4 more...)
GeoMAE: Masking Representation Learning for Spatio-Temporal Graph Forecasting with Missing Values
Ke, Songyu, Wu, Chenyu, Liang, Yuxuan, Qin, Huiling, Zhang, Junbo, Zheng, Yu
The ubiquity of missing data in urban intelligence systems, attributable to adverse environmental conditions and equipment failures, poses a significant challenge to the efficacy of downstream applications, notably in the realms of traffic forecasting and energy consumption prediction. Therefore, it is imperative to develop a robust spatio-temporal learning methodology capable of extracting meaningful insights from incomplete datasets. Despite the existence of methodologies for spatio-temporal graph forecasting in the presence of missing values, unresolved issues persist. Primarily, the majority of extant research is predicated on time-series analysis, thereby neglecting the dynamic spatial correlations inherent in sensor networks. Junbo Zhang is the corresponding author. This research was done when the first author was an intern at JD Intelligent Cities Research & JD iCity under the supervision of the fifth author. The model is comprised of three principal components: an input preprocessing module, an attention-based spatio-temporal forecasting network (STAFN), and an auxiliary learning task, which draws inspiration from Masking AutoEncoders to enhance the robustness of spatio-temporal representation learning. Empirical evaluations on real-world datasets demonstrate that GeoMAE significantly outperforms existing benchmarks, achieving up to 13.20% relative improvement over the best baseline models. Introduction Spatio-temporal representation learning has emerged as a pivotal research area, underpinning various intelligent applications in smart cities that play crucial roles across multiple domains. For instance, precise weather forecasting can significantly mitigate the detrimental impacts of natural disasters through early prevention; advanced traffic prediction systems help optimize traffic flow and substantially reduce congestion; environmental monitoring enables rapid identification of pollution hotspots within urban environments.
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (3 more...)
MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction
Tang, Shuo, Xu, Jian, Zhang, Jiadong, Chen, Yi, Jin, Qizhao, Shen, Lingdong, Liu, Chenglin, Xiang, Shiming
Timely and accurate forecasts of severe weather events are essential for early warning and for constraining downstream analysis and decision-making. Since severe weather events prediction still depends on subjective, time-consuming expert interpretation, end-to-end "AI weather station" systems are emerging but face three major challenges: (1) scarcity of severe weather event samples; (2) imperfect alignment between high-dimensional meteorological data and textual warnings; (3) current multimodal language models cannot effectively process high-dimensional meteorological inputs or capture their complex spatiotemporal dependencies. T o address these challenges, we introduce MP-Bench, the first large-scale multimodal dataset for severe weather events prediction, comprising 421,363 pairs of raw multi-year meteorological data and corresponding text caption, covering a wide range of severe weather scenarios. On top of this dataset, we develop a Meteorology Multimodal Large Model (MMLM) that directly ingests 4D meteorological inputs. In addition, it is designed to accommodate the unique characteristics of 4D meteorological data flow, incorporating three plug-and-play adaptive fusion modules that enable dynamic feature extraction and integration across temporal sequences, vertical pressure layers, and spatial dimensions. Extensive experiments on MP-Bench show that MMLM achieves strong performance across multiple tasks, demonstrating effective severe weather understanding and representing a key step toward automated, AI-driven severe weather events forecasting systems. Our source code and dataset will be made publicly available.
- North America > United States (0.28)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- Energy (0.68)
- Food & Agriculture > Agriculture (0.46)
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
Zhang, Xiaoyang, Li, jinjiang, Fan, Guodong, Ju, Yakun, Fan, Linwei, Liu, Jun, Kot, Alex C.
Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (2 more...)
Skypilot: Fine-Tuning LLM with Physical Grounding for AAV Coverage Search
Chen, Zhongkai, Sun, Yihao, Yan, Chao, Zhou, Han, Xiang, Xiaojia, Jiang, Jie
Autonomous aerial vehicles (AAVs) have played a pivotal role in coverage operations and search missions. Recent advances in large language models (LLMs) offer promising opportunities to augment AAV intelligence. These advances help address complex challenges like area coverage optimization, dynamic path planning, and adaptive decision-making. However, the absence of physical grounding in LLMs leads to hallucination and reproducibility problems in spatial reasoning and decision-making. To tackle these issues, we present Skypilot, an LLM-enhanced two-stage framework that grounds language models in physical reality by integrating monte carlo tree search (MCTS). In the first stage, we introduce a diversified action space that encompasses generate, regenerate, fine-tune, and evaluate operations, coupled with physics-informed reward functions to ensure trajectory feasibility. In the second stage, we fine-tune Qwen3-4B on 23,000 MCTS-generated samples, achieving substantial inference acceleration while maintaining solution quality. Extensive numerical simulations and real-world flight experiments validate the efficiency and superiority of our proposed approach. Detailed information and experimental results are accessible at https://sky-pilot.top.
- Asia > China > Hunan Province (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Asia > China > Fujian Province > Fuzhou (0.05)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > Greece (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Promising Solution (0.92)
- Law (1.00)
- Information Technology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Rényi Differential Privacy for Heavy-Tailed SDEs via Fractional Poincaré Inequalities
Dupuis, Benjamin, Gürbüzbalaban, Mert, Şimşekli, Umut, Wang, Jian, Yildirim, Sinan, Zhu, Lingjiong
Characterizing the differential privacy (DP) of learning algorithms has become a major challenge in recent years. In parallel, many studies suggested investigating the behavior of stochastic gradient descent (SGD) with heavy-tailed noise, both as a model for modern deep learning models and to improve their performance. However, most DP bounds focus on light-tailed noise, where satisfactory guarantees have been obtained but the proposed techniques do not directly extend to the heavy-tailed setting. Recently, the first DP guarantees for heavy-tailed SGD were obtained. These results provide $(0,δ)$-DP guarantees without requiring gradient clipping. Despite casting new light on the link between DP and heavy-tailed algorithms, these results have a strong dependence on the number of parameters and cannot be extended to other DP notions like the well-established Rényi differential privacy (RDP). In this work, we propose to address these limitations by deriving the first RDP guarantees for heavy-tailed SDEs, as well as their discretized counterparts. Our framework is based on new Rényi flow computations and the use of well-established fractional Poincaré inequalities. Under the assumption that such inequalities are satisfied, we obtain DP guarantees that have a much weaker dependence on the dimension compared to prior art.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance (1.00)